A Semi-Automatic System for the Syllabification and Stress Assignment
Authors
Abstract
This master's thesis concerns research in the automatic analysis of the sub-lexical structure of English words. Sub-lexical structure includes linguistic categories such as syllabification, stress, phonemic representation, phonetics, and spelling. This information could be useful in many speech applications, including duration modeling and speech recognition. ANGIE is a system that can parse words, given either their phonetic or orthographic representation, into a common hierarchical framework with the categories mentioned above. A new feature enforcing morphological constraints has recently been added to this paradigm. We define "morphs" to be syllable-like units of a word, each of which is tagged morphologically and associated with both an orthographic sequence and a phonemic representation. Each word is represented as a concatenation of these morphs, which together encode both the orthography and the phonemics of the word. This thesis defines a procedure to semi-automatically derive a sub-lexical representation of new words in terms of these morphs, using ANGIE's hierarchical framework. One distinctive characteristic of this procedure is that it uses both the phonetic and the spelling information. The procedure is developed using several corpora. When this procedure is used to derive the sub-lexical representations, some words will fail, either because the word is rejected by the hierarchical framework or because a morph needed to transcribe the word is missing. The words that successfully obtain morphological decompositions are used to evaluate the coverage and accuracy of the existing procedure. The words that fail to be represented are a valuable resource because they provide new information about the sub-lexical structure of English. This new information can be incorporated into our procedure to improve its coverage and accuracy.
Thesis Supervisor: Dr. Stephanie Seneff
Title: Principal Research Scientist
Thesis Co-Supervisor: Dr. Helen Meng
Title: Research Scientist
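The morph representation described in the abstract can be illustrated with a minimal sketch. This is not ANGIE's parser, which the abstract describes as a hierarchical framework. It is only a toy exhaustive search that shows how a word can be written as a concatenation of morphs that jointly account for its spelling and its phonemes, and how a word fails when a needed morph is missing. The class `Morph`, the function `decompose`, the tag names, the morph inventory, and the ARPAbet-style phoneme symbols are all assumptions made for illustration and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Morph:
    """A toy morph: a morphological tag plus spelling and phonemes (all hypothetical)."""
    tag: str
    spelling: str
    phonemes: Tuple[str, ...]


def decompose(spelling: str,
              phonemes: Tuple[str, ...],
              inventory: Sequence[Morph]) -> Optional[List[Morph]]:
    """Return morphs whose spellings and phonemes concatenate to the word, or None.

    Both information sources must be consumed exactly, so a morph is only
    accepted when it matches the front of the spelling AND of the phonemes.
    """
    if not spelling and not phonemes:
        return []  # everything consumed: successful decomposition
    for morph in inventory:
        if not morph.spelling and not morph.phonemes:
            continue  # skip empty morphs so the recursion always makes progress
        n = len(morph.phonemes)
        if spelling.startswith(morph.spelling) and phonemes[:n] == morph.phonemes:
            rest = decompose(spelling[len(morph.spelling):], phonemes[n:], inventory)
            if rest is not None:
                return [morph] + rest
    return None  # no morph sequence covers the word


# Hypothetical inventory; real morph inventories and tags will differ.
INVENTORY = [
    Morph("prefix", "un", ("ah", "n")),
    Morph("stressed_root", "hap", ("hh", "ae", "p")),
    Morph("suffix", "py", ("iy",)),
]

covered = decompose("unhappy", ("ah", "n", "hh", "ae", "p", "iy"), INVENTORY)
missing = decompose("unhappily", ("ah", "n", "hh", "ae", "p", "ah", "l", "iy"), INVENTORY)
```

Here `covered` holds three morphs, while `missing` is `None` because the toy inventory has no morph for the ending of "unhappily". In the workflow the abstract describes, such failures are the cases that point to morphs worth adding.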
Similar resources
Automatic methods for lexical stress assignment and syllabification
Improvements in automatic lexical stress assignment and syllabification can increase the quality of text-to-speech synthesis as well as decrease the memory requirements for dictionaries. Several methods were evaluated. Machine-learning based methods are preferred since they easily adapt to multiple languages. For stress prediction, encouraging results were obtained by combining a decision tree ap...
Phonological Processing for Urdu Text to Speech System
Determining and modeling phonological phenomena is necessary to generate speech from textual input. These phenomena include letter to sound conversion, syllabification, sound change, stress assignment and intonation assignment. This paper presents work on Urdu phonological processes and provides algorithms to convert textual input into phonologically annotated output, required for Urdu text-to-...
Automatic word stress marking and syllabification for Catalan TTS
Stress and syllabification are essential attributes for several components in text-to-speech (TTS) systems. They are responsible for improving grapheme-to-phoneme conversion rules and for enhancing the synthetic intelligibility, since stress and syllable are key units in prosody prediction. This paper presents three linguistically rule-based automatic algorithms for Catalan text-to-speech conve...
Data-driven approaches for automatic detection of syllable boundaries
Syllabification is an essential component of many speech and language processing systems. The development of automatic speech recognizers frequently requires working with subword units such as syllables. More importantly, syllabification is an inevitable part of speech synthesis systems. In this paper we present data-driven approaches to supervised learning and automatic detection of syllable bo...
Letter-to-Phoneme Conversion for a German Text-to-Speech System
This thesis deals with the conversion from letters to phonemes, syllabification and word stress assignment for a German text-to-speech system. In the first part of the thesis (chapter 5), several alternative approaches for morphological segmentation are analysed and the benefit of such a morphological preprocessing component is evaluated with respect to the grapheme-to-phoneme conversion algori...
The Interaction of Stress and Syllabification: Serial or Parallel?
It is a standard assumption in phonology that stress is assigned to syllables. It would seem to be common sense, then, that syllabification precedes stress assignment. But, as we will see, common sense is wrong in this particular case. Based on evidence from Finnish, we will show that syllabification cannot strictly precede stress assignment, but that the two happen in parallel, as in classical...